
    PRESENCE: A human-inspired architecture for speech-based human-machine interaction

    Recent years have seen steady improvements in the quality and performance of speech-based human-machine interaction, driven by a significant convergence in the methods and techniques employed. However, the quantity of training data required to improve state-of-the-art systems seems to be growing exponentially, and performance appears to be asymptotic to a level that may be inadequate for many real-world applications. This suggests that there may be a fundamental flaw in the underlying architecture of contemporary systems, as well as a failure to capitalize on the combinatorial properties of human spoken language. This paper addresses these issues and presents a novel architecture for speech-based human-machine interaction inspired by recent findings in the neurobiology of living systems. Called PRESENCE ("PREdictive SENsorimotor Control and Emulation"), this new architecture blurs the distinction between the core components of a traditional spoken language dialogue system and instead focuses on a recursive hierarchical feedback control structure. Cooperative and communicative behavior emerges as a by-product of an architecture that is founded on a model of interaction in which the system has in mind the needs and intentions of a user, and the user has in mind the needs and intentions of the system.
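
    The abstract gives no implementation detail, but the core idea (a recursive hierarchy of negative-feedback controllers, each layer setting the reference for the layer below) can be sketched in a few lines. Everything below, including the gains and the toy plant dynamics, is an illustrative assumption rather than PRESENCE itself:

    ```python
    # Minimal sketch of a recursive hierarchical feedback control loop.
    # All names, gains and the plant model are illustrative assumptions.

    class ControlLayer:
        """One layer: compares a reference signal against its perception
        of the world and outputs a correction (negative feedback)."""
        def __init__(self, gain):
            self.gain = gain

        def step(self, reference, perception):
            error = reference - perception
            return self.gain * error  # action that reduces the error

    def run_hierarchy(layers, goal, world_state, steps=50):
        """Each layer's output becomes the reference for the layer below;
        the lowest layer's output acts on a toy world."""
        for _ in range(steps):
            reference = goal
            for layer in layers:
                action = layer.step(reference, world_state)
                reference = world_state + action  # sub-goal for next layer
            world_state += 0.1 * action           # toy plant dynamics
        return world_state

    layers = [ControlLayer(gain=g) for g in (1.0, 0.5)]
    print(run_hierarchy(layers, goal=1.0, world_state=0.0))  # converges toward 1.0
    ```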

    Computing phonological generalization over real speech exemplars

    Though it has attracted growing attention from phonologists and phoneticians, Exemplar Theory (e.g. Bybee 2001) has hitherto lacked an explicit production model that can apply to speech signals. An adequate model must be able to generalize, but this presents the problem of how to generate an output that generalizes over a collection of unique, variable-length signals. Rather than resorting to a priori phonological units such as phones, we adopt a dynamic programming approach using an optimization criterion that is sensitive to the frequency of similar subsequences within other exemplars: the Phonological Exemplar-Based Learning System (PEBLS). We show that PEBLS displays pattern-entrenchment behaviour central to Exemplar Theory's account of phonologization. © 2010 Elsevier Ltd. All rights reserved.
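
    PEBLS's actual optimisation criterion is not spelled out in the abstract; the toy sketch below shows only the general shape of the approach: dynamic-programming alignment over variable-length exemplars, then selection of the output best supported by the rest of the exemplar cloud. The DTW cost and the averaging criterion are stand-ins, not the published algorithm:

    ```python
    # Illustrative sketch: DP alignment over a cloud of variable-length
    # exemplars, generalizing without a priori phone-sized units.

    import numpy as np

    def dtw(a, b):
        """Classic dynamic-time-warping cost between two 1-D feature tracks."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    def generalise(exemplars):
        """Pick the exemplar closest on average to all the others, i.e.
        the one whose subsequences recur most across the cloud."""
        scores = [np.mean([dtw(e, f) for f in exemplars if f is not e])
                  for e in exemplars]
        return exemplars[int(np.argmin(scores))]

    cloud = [np.array([0, 1, 2, 3]), np.array([0, 1, 1, 2, 3]), np.array([0, 2, 3])]
    print(generalise(cloud))
    ```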

    A needs-driven cognitive architecture for future 'intelligent' communicative agents

    Recent years have seen considerable progress in the deployment of 'intelligent' communicative agents such as Apple's Siri, Google Now, Microsoft's Cortana and Amazon's Alexa. Such speech-enabled assistants are distinguished from the previous generation of voice-based systems in that they claim to offer access to services and information via conversational interaction. In reality, the interaction has limited depth and, after initial enthusiasm, users revert to more traditional interface technologies. This paper argues that the standard architecture for a contemporary communicative agent fails to capture the fundamental properties of human spoken language, so an alternative needs-driven cognitive architecture is proposed which models speech-based interaction as an emergent property of coupled hierarchical feedback control processes. The implications for future spoken language systems are discussed.
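
    As a rough illustration of "interaction as an emergent property of coupled feedback control", here are two controllers regulating a shared variable toward their own need levels; the settled state lies between the two needs without any explicit negotiation. The model and its parameters are assumptions for illustration, not the paper's architecture:

    ```python
    # Two coupled negative-feedback controllers, one per interlocutor,
    # each nudging a shared "common ground" variable toward its own need.

    def interact(need_user, need_agent, steps=100, gain=0.2):
        ground = 0.0  # shared state both parties perceive and act on
        for _ in range(steps):
            ground += gain * (need_user - ground)   # user's corrective act
            ground += gain * (need_agent - ground)  # agent's corrective act
        return ground

    # The settled state lies between the two needs: behaviour that looks
    # cooperative emerges from nothing but the coupled control loops.
    print(interact(need_user=1.0, need_agent=0.5))
    ```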

    Talking with robots: opportunities and challenges

    Notwithstanding the tremendous progress that is taking place in spoken language technology, effective speech-based human-robot interaction still raises a number of important challenges. Not only do the fields of robotics and spoken language technology present their own special problems, but their combination raises an additional set of issues. In particular, there is a large gap between the formulaic speech that typifies contemporary spoken dialogue systems and the flexible nature of human-human conversation. It is pointed out that grounded and situated speech-based human-robot interaction may lead to deeper insights into the pragmatics of language usage, thereby overcoming the current 'habitability gap'.

    The Sheffield Search and Rescue corpus

    © 2017 IEEE. As part of ongoing research into extracting mission-critical information from Search and Rescue speech communications, a corpus of unscripted, goal-oriented, two-party spoken conversations has been designed and collected. The Sheffield Search and Rescue (SSAR) corpus comprises about 12 hours of data from 96 conversations by 24 native speakers of British English with a southern accent. Each conversation concerns a collaborative task of exploring and estimating a simulated indoor environment. The task has been carefully designed to provide a quantitative measure of the amount of information exchanged about the discourse subject. SSAR includes several layers of annotation which should be of interest to researchers in human-human conversation understanding as well as automatic speech recognition. It also provides data for the analysis of multiple parallel conversations around a single subject. The SSAR corpus is available for research purposes.

    Creating a voice for MiRo, the world's first commercial biomimetic robot

    Copyright © 2017 ISCA. This paper introduces MiRo, the world's first commercial biomimetic robot, and describes how its vocal system was designed using a real-time parametric general-purpose mammalian vocal synthesiser tailored to the specific physical characteristics of the robot. MiRo's capabilities will be demonstrated live during the hands-on interactive 'Show & Tell' session at INTERSPEECH-2017.
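
    The synthesiser itself is not specified in the abstract, but a parametric mammalian vocaliser is commonly built as a source-filter model in which one body-size parameter scales pitch and formants together. The sketch below is a generic illustration along those lines; all frequencies and bandwidths are assumed values, not MiRo's:

    ```python
    # Generic source-filter vocaliser: a glottal pulse train shaped by a
    # cascade of formant resonators, with one "body size" knob.

    import numpy as np
    from scipy.signal import lfilter

    def resonator(signal, freq, bw, fs):
        """Two-pole formant resonator at freq Hz with bandwidth bw Hz."""
        r = np.exp(-np.pi * bw / fs)
        theta = 2 * np.pi * freq / fs
        return lfilter([1.0], [1.0, -2 * r * np.cos(theta), r * r], signal)

    def vocalise(size=1.0, dur=0.5, fs=16000):
        """Smaller `size` -> higher pitch and formants (a smaller animal)."""
        f0 = 440.0 / size                                # fundamental
        t = np.arange(int(dur * fs))
        source = (t % int(fs / f0) == 0).astype(float)   # glottal pulses
        out = source
        for formant in (800.0, 1200.0, 2500.0):          # neutral tract
            out = resonator(out, formant / size, 100.0, fs)
        return out / np.max(np.abs(out))                 # normalise level

    audio = vocalise(size=0.5)  # a small, high-pitched creature
    ```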

    Using Alexa for flashcard-based learning

    Despite increasing awareness of Alexa’s potential as an educational tool, there remains limited scope for Alexa skills to accommodate the features required for effective language learning. This paper describes an investigation into implementing ‘spaced repetition’, a non-trivial feature of flashcard-based learning, through the development of an Alexa skill called ‘Japanese Flashcards’. Here we show that existing Alexa development features such as skill persistence allow for the effective implementation of spaced repetition, and we suggest a heuristic adaptation of the spaced-repetition model that is appropriate for use with voice assistants (VAs). We also highlight areas of the Alexa development process that limit the facilitation of language learning, namely the lack of multilingual speech recognition, and offer solutions to these current limitations. Overall, the investigation shows that Alexa can successfully facilitate simple L2-L1 flashcard-based language learning and highlights the potential for Alexa to be used as a sophisticated and effective language learning tool.
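
    As an illustration of how spaced repetition maps onto skill persistence, the sketch below keeps Leitner-style review state in a dictionary standing in for Alexa's per-user persistent attributes. The intervals and card format are assumptions, not the scheduling actually used by 'Japanese Flashcards':

    ```python
    # Leitner-style spaced-repetition bookkeeping of the kind an Alexa
    # skill can keep in its persistent attributes.

    import time

    INTERVALS = [0, 60, 600, 86400, 604800]  # seconds per Leitner box

    def review(card, correct, store):
        """Move the card up a box on success, back to box 0 on failure,
        and record when it is next due. `store` stands in for the skill's
        persistent attributes (a per-user JSON document on Alexa)."""
        state = store.setdefault(card, {"box": 0, "due": 0})
        state["box"] = min(state["box"] + 1, len(INTERVALS) - 1) if correct else 0
        state["due"] = time.time() + INTERVALS[state["box"]]

    def next_due(store):
        """The card the skill should ask about next, if any are due."""
        due = [c for c, s in store.items() if s["due"] <= time.time()]
        return min(due, key=lambda c: store[c]["due"]) if due else None

    store = {}
    review("neko = cat", correct=True, store=store)
    print(next_due(store))  # None until the 60-second interval has elapsed
    ```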

    Vocal interactivity in crowds, flocks and swarms : implications for voice user interfaces

    Recent years have seen an explosion in the availability of Voice User Interfaces. However, user surveys suggest that there are issues with respect to usability, and it has been hypothesised that contemporary voice-enabled systems are missing crucial behaviours relating to user engagement and vocal interactivity. Such ostensive behaviours are well established as ubiquitous in the animal kingdom, where vocalisation provides a means through which interaction may be coordinated and managed between individuals and within groups. Hence, this paper reports results from a study aimed at identifying generic mechanisms that might underpin coordinated collective vocal behaviour, with a particular focus on closed-loop negative-feedback control as a powerful regulatory process. A computer-based real-time simulation of vocal interactivity is described which has provided a number of insights, including the enumeration of several key control variables that may be worthy of further investigation.
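
    The simulation itself is not reproduced here, but its central mechanism (each agent using negative feedback to hold the group level it hears at an internal set point) can be sketched as follows; the group size, gain and set point are illustrative choices:

    ```python
    # Each agent regulates its own vocal output so that the group level
    # it hears matches an internal set point: closed-loop negative feedback.

    import random

    def simulate(n_agents=10, set_point=1.0, gain=0.1, steps=200):
        levels = [random.random() for _ in range(n_agents)]
        for _ in range(steps):
            group = sum(levels)
            for i in range(n_agents):
                heard = group - levels[i]            # everyone else's output
                error = set_point - heard
                levels[i] = max(0.0, levels[i] + gain * error)
        return levels

    # Individual levels settle so that each agent hears roughly its set
    # point: coordination without any explicit turn-taking protocol.
    print([round(x, 2) for x in simulate()])
    ```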

    Brain-computer interface technology for speech recognition: A review

    This paper presents an overview of studies that have been conducted with the purpose of understanding the use of brain signals as input to a speech recogniser. The studies are categorised by the type of technology used, with a summary of the methodologies employed and the results achieved. In addition, the paper gives an insight into studies that examined the effect of the chosen stimuli on brain activity, an important factor in the recognition process. The remainder of the paper lists the limitations of the available studies and the challenges for future work in this area.

    Learning temporal clusters using capsule routing for speech emotion recognition

    Emotion recognition from speech plays a significant role in adding emotional intelligence to machines and making human-machine interaction more natural. One of the key challenges from a machine learning standpoint is to extract patterns which bear maximum correlation with the emotion information encoded in the signal while being as insensitive as possible to the other types of information carried by speech. In this paper, we propose a novel temporal modelling framework for robust emotion classification using a bidirectional long short-term memory network (BLSTM), a CNN and capsule networks. The BLSTM deals with the temporal dynamics of the speech signal by effectively representing forward/backward contextual information, while the CNN, along with the dynamic routing of the capsule network, learns temporal clusters which altogether provide a state-of-the-art technique for classifying the extracted patterns. The proposed approach was compared with a wide range of architectures on the FAU-Aibo and RAVDESS corpora and remarkable gains over state-of-the-art systems were obtained. For FAU-Aibo and RAVDESS, accuracies of 77.6% and 56.2% were achieved, respectively, which are 3% and 14% (absolute) higher than the best-reported results for the respective tasks.
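
    The paper's exact topology and hyper-parameters are not given in the abstract, so the PyTorch sketch below only illustrates the named ingredients: a BLSTM over acoustic features, a 1-D CNN forming primary capsules, and dynamic routing to one output capsule per emotion class. All layer sizes are assumptions:

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def squash(s, dim=-1):
        # Non-linearity that keeps capsule vector lengths in [0, 1)
        n2 = (s ** 2).sum(dim=dim, keepdim=True)
        return (n2 / (1.0 + n2)) * s / torch.sqrt(n2 + 1e-8)

    class EmotionCapsNet(nn.Module):
        def __init__(self, n_feats=40, n_frames=100, n_classes=4,
                     caps_dim=8, out_dim=16):
            super().__init__()
            self.blstm = nn.LSTM(n_feats, 64, batch_first=True,
                                 bidirectional=True)
            self.conv = nn.Conv1d(128, 64, kernel_size=3, stride=2)
            t_out = (n_frames - 3) // 2 + 1      # conv output length
            n_in = t_out * (64 // caps_dim)      # number of primary capsules
            self.caps_dim, self.n_classes = caps_dim, n_classes
            # Transformations mapping each primary capsule to each class
            self.W = nn.Parameter(0.01 * torch.randn(1, n_in, n_classes,
                                                     out_dim, caps_dim))

        def route(self, u_hat, iters=3):
            # Dynamic routing-by-agreement (Sabour et al., 2017)
            b = torch.zeros(u_hat.size(0), u_hat.size(1), self.n_classes, 1,
                            device=u_hat.device)
            for _ in range(iters):
                c = F.softmax(b, dim=2)                        # couplings
                v = squash((c * u_hat).sum(dim=1, keepdim=True))
                b = b + (u_hat * v).sum(dim=-1, keepdim=True)  # agreement
            return v.squeeze(1)                  # (batch, classes, out_dim)

        def forward(self, x):                    # x: (batch, frames, feats)
            h, _ = self.blstm(x)                 # (batch, frames, 128)
            h = self.conv(h.transpose(1, 2))     # (batch, 64, t_out)
            bsz, ch, t = h.shape
            u = squash(h.transpose(1, 2).reshape(
                bsz, t * ch // self.caps_dim, self.caps_dim))
            u_hat = torch.matmul(self.W, u[:, :, None, :, None]).squeeze(-1)
            v = self.route(u_hat)
            return v.norm(dim=-1)                # one length per class

    net = EmotionCapsNet()
    print(net(torch.randn(2, 100, 40)).shape)    # torch.Size([2, 4])
    ```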